A Framework for Efficient Scalable Mining of Rule Variants
Authors
Abstract
Association rule mining is an important data mining problem. Since its inception, different variants of rules have been proposed in the literature. In each case, different attributes (e.g., weight and quantity) are considered to obtain more informative rules. To our knowledge, each proposal is based on the Apriori algorithm, which is inefficient by modern standards. Methods that outperform Apriori (e.g., FP-Growth and DiffSets) are restricted to the discovery of plain vanilla rules and do not scale well to mining other variants. In this paper, we present a unifying framework for mining variants of rules that separates the scalability and performance aspects of Apriori-based algorithms from the constraints of mining each specific variant. This framework is easy to instantiate with algorithms proposed to date and supports new algorithms for future variants. More importantly, it favors the simplicity of the Apriori algorithm, brings its performance up to that of FP-Growth, and maintains a simple scalability model.
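The separation described in the abstract, a generic level-wise search on one side and a variant-specific measure on the other, can be illustrated with a minimal sketch. This is not the paper's framework: the names `levelwise_mine`, `plain_support`, and `make_weighted_support` are hypothetical, the weighted variant (share of total transaction weight) is only one possible reading of "weight", and the search is plain Apriori with none of the FP-Growth-level optimizations the abstract mentions. The sketch also assumes the measure is anti-monotone, so that subset-based pruning stays valid.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Sequence

Itemset = FrozenSet[str]
Transaction = FrozenSet[str]
# Assumption: a rule "variant" is modeled as a scoring function over an itemset.
Measure = Callable[[Itemset, Sequence[Transaction]], float]


def plain_support(itemset: Itemset, transactions: Sequence[Transaction]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def make_weighted_support(weights: Sequence[float]) -> Measure:
    """Hypothetical variant: share of total transaction weight covered."""
    total = sum(weights)

    def measure(itemset: Itemset, transactions: Sequence[Transaction]) -> float:
        covered = sum(w for t, w in zip(transactions, weights) if itemset <= t)
        return covered / total

    return measure


def levelwise_mine(
    transactions: Sequence[Transaction], measure: Measure, threshold: float
) -> Dict[Itemset, float]:
    """Generic Apriori-style search; assumes `measure` is anti-monotone."""
    items = sorted({i for t in transactions for i in t})
    frequent: Dict[Itemset, float] = {}
    level: List[Itemset] = [frozenset([i]) for i in items]
    while level:
        # Variant-specific part: score the candidates of this level.
        kept = {c: s for c in level
                if (s := measure(c, transactions)) >= threshold}
        frequent.update(kept)
        # Join step: merge kept k-itemsets into (k+1)-item candidates.
        k = len(level[0]) + 1
        candidates = {a | b for a, b in combinations(kept, 2) if len(a | b) == k}
        # Prune step: every k-item subset of a candidate must have been kept.
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent
```

Switching from plain to weighted mining only changes the `measure` argument; the search loop is untouched, which is the kind of decoupling the abstract argues for.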
Related resources
Frequent closed itemsets based condensed representations for association rules
After more than a decade of research on association rule mining, efficient and scalable techniques for the discovery of relevant association rules from large high-dimensional datasets are now available. Most initial studies have focused on the development of theoretical frameworks and efficient algorithms and data structures for association rule mining. However, many applications of associa...
A new approach based on data envelopment analysis with double frontiers for ranking the discovered rules from data mining
Data envelopment analysis (DEA) is a relatively new data-oriented approach to evaluating the performance of a set of peer entities called decision-making units (DMUs) that convert multiple inputs into multiple outputs. Within a relatively short period, DEA has become a strong quantitative and analytical tool for measuring and evaluating performance. In an article written by Toloo et al. (2009...
Scalable Data Mining for Rules
Data Mining is the process of automatic extraction of novel, useful, and understandable patterns in very large databases. High-performance scalable and parallel computing is crucial for ensuring system scalability and interactivity as datasets grow inexorably in size and complexity. This thesis deals with both the algorithmic and systems aspects of scalable and parallel data mining algorithms a...
ART: A Hybrid Classification Model
This paper presents a new family of decision list induction algorithms based on ideas from the association rule mining context. ART, which stands for ‘Association Rule Tree’, builds decision lists that can be viewed as degenerate, polythetic decision trees. Our method is a generalized “Separate and Conquer” algorithm suitable for Data Mining applications because it makes use of efficient and sc...
Towards Efficiently Running Workflow Variants by Automated Extraction of Business Rule Conditions
Efficient workflow variant management is becoming crucial especially for enterprises with a large process landscape. Our research fosters the combination of business rules for adapting reference workflows at runtime and tailoring them to many different situations. A main goal is to optimize the performance of workflow instances w.r.t. different aspects, e.g., branching decisions, throughput tim...